Written by: Fariborz Norouzi

Introduction:

Population growth and urban development cause traffic congestion and environmental pollution, which in turn have many economic and social impacts. This is one of the most pressing problems facing urban communities. City planners and managers regard efficient and effective public transportation systems as one of the best ways to address it. Analyzing public transit data gives a good picture of the current state and efficiency of a public transport system.

Generally, there are three main kinds of public transit data.

For simplicity, this sample considers only scheduling with fixed data.

Public transportation agencies generate a massive volume of data in a standard format named the General Transit Feed Specification (GTFS). GTFS is the most popular open standard for encoding scheduling data. It was developed in 2005 by Portland's TriMet transit agency and Google. Many public transit agencies, such as Sacramento Regional Transit (SACRT), publish their GTFS schedules publicly, usually every three months. GTFS files are comma-separated values (CSV) files. GTFS feeds contain fixed data for scheduled transit service, consisting of stops, stop times, calendar, trips and route locations, together with schedule information.
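Because a feed is just a zip archive of CSV text files, it can be read with very little code. The following is a minimal sketch using only the Python standard library, not the approach used later in this article. The helper names `build_sample_feed` and `load_tables` are hypothetical, and the rows are made-up sample values rather than real SACRT data.

```python
import csv
import io
import zipfile

# Made-up sample tables so the sketch runs without downloading a feed.
SAMPLE_FILES = {
    "stops.txt": (
        "stop_id,stop_name,stop_lat,stop_lon\n"
        "S1,Main St,38.58,-121.49\n"
        "S2,Oak Ave,38.59,-121.48\n"
    ),
    "routes.txt": "route_id,route_short_name,route_type\nR1,1,3\n",
}


def build_sample_feed():
    """Return the bytes of a zip archive holding the sample tables."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in SAMPLE_FILES.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def load_tables(zip_bytes):
    """Parse every .txt table in a GTFS zip into a list of dict rows."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt"):
                continue
            # utf-8-sig drops the byte-order mark that some feeds include.
            text = archive.read(name).decode("utf-8-sig")
            tables[name[:-4]] = list(csv.DictReader(io.StringIO(text)))
    return tables


feed = load_tables(build_sample_feed())
print(sorted(feed))
print(feed["stops"][0]["stop_name"])
```

Every value comes back as a string, so numeric columns such as `stop_lat` need an explicit conversion before any arithmetic.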

GTFS Data Model:

Nowadays, the General Transit Feed Specification (GTFS) data format is used by many public transport providers around the globe. In this section, we review the basic components of GTFS data. A GTFS dataset is a series of comma-delimited text files with a specific schema. The files are related to one another by ID fields. Some of the files are required and some are optional. The following picture shows the data model of GTFS data as a logical ERD (Entity Relationship Diagram). An ERD is a graphical representation that depicts relationships among different files such as the stops, stop times, calendar, routes, trips and shapes files. The geographic components of a GTFS dataset represent how vehicles travel through the system, while the calendar, calendar dates and stop times text files describe the transit schedules.
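To illustrate how the ID fields tie the files together, here is a small sketch in plain Python that follows `route_id` to `trip_id` to `stop_id` to find the stops served by a route. The rows and the function name `stops_served_by_route` are invented for illustration. With Pandas, the same chain would typically be expressed as merges on the shared columns.

```python
# Made-up sample rows, keyed the way the GTFS tables are keyed.
routes = [{"route_id": "R1", "route_short_name": "1"}]
trips = [
    {"trip_id": "T1", "route_id": "R1", "service_id": "WK"},
    {"trip_id": "T2", "route_id": "R1", "service_id": "WK"},
]
stops = [
    {"stop_id": "S1", "stop_name": "Main St"},
    {"stop_id": "S2", "stop_name": "Oak Ave"},
    {"stop_id": "S3", "stop_name": "Elm Rd"},
]
stop_times = [
    {"trip_id": "T1", "stop_id": "S1", "stop_sequence": "1"},
    {"trip_id": "T1", "stop_id": "S2", "stop_sequence": "2"},
    {"trip_id": "T2", "stop_id": "S2", "stop_sequence": "1"},
]


def stops_served_by_route(route_id):
    """Follow route_id -> trip_id -> stop_id and return sorted stop names."""
    trip_ids = {t["trip_id"] for t in trips if t["route_id"] == route_id}
    stop_ids = {st["stop_id"] for st in stop_times if st["trip_id"] in trip_ids}
    names = {s["stop_id"]: s["stop_name"] for s in stops}
    return sorted(names[stop_id] for stop_id in stop_ids)


served = stops_served_by_route("R1")
print(served)
```

Note that routes and stops are never linked directly: the relationship only exists through the trips and stop times tables.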

Logical ERD of the GTFS dataset

gtfs_data_model.png

Analysis of GTFS data with Python - Case study: Sacramento Regional Transit District

I use the Python programming language in the Jupyter Notebook application, together with GTFSTK, to analyze GTFS data and generate sample ad hoc reports. GTFSTK is a Python 3.6+ toolkit for analyzing GTFS data in memory without a database. It uses Pandas and Shapely to do the heavy lifting.

Before publishing, GTFS feeds should be validated in order to catch errors.
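As a rough idea of what validation involves, the sketch below checks two things: that the tables the specification always requires are present, and that ID references point at existing rows. It is a hand-written illustration, not a real validator, and it covers only a small fraction of what a full validator checks. The function name `check_feed` and the sample rows are assumptions made for this example.

```python
# Tables that the GTFS specification always requires.
REQUIRED_TABLES = ("agency", "stops", "routes", "trips", "stop_times")

# (child table, shared ID column, parent table)
FOREIGN_KEYS = (
    ("trips", "route_id", "routes"),
    ("stop_times", "trip_id", "trips"),
    ("stop_times", "stop_id", "stops"),
)


def check_feed(tables):
    """Return a list of human-readable problems found in the feed tables."""
    problems = []
    for name in REQUIRED_TABLES:
        if not tables.get(name):
            problems.append(f"missing or empty table: {name}")
    for child, column, parent in FOREIGN_KEYS:
        known = {row[column] for row in tables.get(parent, [])}
        for row in tables.get(child, []):
            if row[column] not in known:
                problems.append(
                    f"{child}.{column}={row[column]} not found in {parent}"
                )
    return problems


# Made-up feed with one deliberate error.
sample = {
    "agency": [{"agency_name": "Demo Transit"}],
    "routes": [{"route_id": "R1"}],
    "trips": [{"trip_id": "T1", "route_id": "R1"}],
    "stops": [{"stop_id": "S1"}],
    "stop_times": [
        {"trip_id": "T1", "stop_id": "S1"},
        {"trip_id": "T1", "stop_id": "S9"},  # dangling stop reference
    ],
}

problems = check_feed(sample)
for problem in problems:
    print(problem)
```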

This query shows detailed trip information based on 5024 trips every day (.T transposes the dataset).
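The kind of per-trip summary described here can be sketched by hand from the stop times table. The example below is a simplified illustration in plain Python, not the toolkit's own computation. The name `trip_stats`, its output fields, and the sample rows are assumptions. It also handles a GTFS detail worth knowing: times may exceed 24:00:00 for trips that run past midnight.

```python
from collections import defaultdict


def to_seconds(hms):
    """Convert a GTFS HH:MM:SS string to seconds; HH may exceed 23."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def trip_stats(stop_times):
    """Summarize each trip: stop count, start, end and duration in minutes."""
    by_trip = defaultdict(list)
    for row in stop_times:
        by_trip[row["trip_id"]].append(row)

    stats = {}
    for trip_id, rows in by_trip.items():
        # Rows are not guaranteed to be ordered, so sort by stop_sequence.
        rows.sort(key=lambda r: int(r["stop_sequence"]))
        start = rows[0]["departure_time"]
        end = rows[-1]["arrival_time"]
        stats[trip_id] = {
            "num_stops": len(rows),
            "start_time": start,
            "end_time": end,
            "duration_min": (to_seconds(end) - to_seconds(start)) / 60,
        }
    return stats


# Made-up stop times; trip T1 crosses midnight.
sample_stop_times = [
    {"trip_id": "T1", "stop_sequence": "2",
     "arrival_time": "24:10:00", "departure_time": "24:10:00"},
    {"trip_id": "T1", "stop_sequence": "1",
     "arrival_time": "23:50:00", "departure_time": "23:50:00"},
    {"trip_id": "T2", "stop_sequence": "1",
     "arrival_time": "08:00:00", "departure_time": "08:00:00"},
    {"trip_id": "T2", "stop_sequence": "2",
     "arrival_time": "08:07:30", "departure_time": "08:07:30"},
    {"trip_id": "T2", "stop_sequence": "3",
     "arrival_time": "08:15:00", "departure_time": "08:15:00"},
]

stats = trip_stats(sample_stop_times)
for trip_id, summary in sorted(stats.items()):
    print(trip_id, summary)
```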

The following filtered datasets can be viewed for different scenarios.
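One common scenario is selecting the trips that run on a particular date. The sketch below does this from the calendar table alone, as an assumption-laden illustration: it ignores the exceptions listed in calendar dates, and the helper names `calendar_row`, `active_services` and `trips_on`, as well as the sample rows, are invented for this example.

```python
from datetime import date, datetime

# Column names of the weekday flags in calendar.txt, Monday first.
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")


def calendar_row(service_id, day_flags, start, end):
    """Build a calendar.txt-style row from a 7-character flag string."""
    row = {"service_id": service_id, "start_date": start, "end_date": end}
    row.update(zip(WEEKDAYS, day_flags))
    return row


# Made-up services: weekdays only, and Saturdays only.
calendar = [
    calendar_row("WK", "1111100", "20240101", "20240331"),
    calendar_row("SA", "0000010", "20240101", "20240331"),
]
trips = [
    {"trip_id": "T1", "service_id": "WK"},
    {"trip_id": "T2", "service_id": "WK"},
    {"trip_id": "T3", "service_id": "SA"},
]


def active_services(calendar_rows, day):
    """Return the service_ids running on `day` (calendar_dates ignored)."""
    column = WEEKDAYS[day.weekday()]
    active = set()
    for row in calendar_rows:
        start = datetime.strptime(row["start_date"], "%Y%m%d").date()
        end = datetime.strptime(row["end_date"], "%Y%m%d").date()
        if start <= day <= end and row[column] == "1":
            active.add(row["service_id"])
    return active


def trips_on(day):
    """Return the trip_ids whose service is active on `day`."""
    services = active_services(calendar, day)
    return [t["trip_id"] for t in trips if t["service_id"] in services]


print(trips_on(date(2024, 1, 3)))  # a Wednesday
print(trips_on(date(2024, 1, 6)))  # a Saturday
```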

Creating an interactive inline Jupyter map of Sacramento transit routes

I used folium, a powerful Python library for creating interactive maps. Users can easily zoom in and out on the map. In addition, clicking a circle, which indicates a bus stop location, retrieves the bus stop information, and clicking a line shows the corresponding route information.

Hovering the pointer over the top right corner of the map shows the list of all active transit routes displayed on the map. Users can quickly remove or add a route by unchecking or checking its box.
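The geometry behind such a map comes from the stops and shapes tables. As a library-independent sketch, the code below turns them into a GeoJSON feature collection, with stops as points and each shape as a line. It is not the code used to produce the map described above. The function names and sample coordinates are made up for illustration.

```python
import json
from collections import defaultdict

# Made-up sample rows in the layout of stops.txt and shapes.txt.
stops = [
    {"stop_id": "S1", "stop_name": "Main St",
     "stop_lat": "38.580", "stop_lon": "-121.490"},
    {"stop_id": "S2", "stop_name": "Oak Ave",
     "stop_lat": "38.590", "stop_lon": "-121.480"},
]
shapes = [
    {"shape_id": "SH1", "shape_pt_sequence": "2",
     "shape_pt_lat": "38.590", "shape_pt_lon": "-121.480"},
    {"shape_id": "SH1", "shape_pt_sequence": "1",
     "shape_pt_lat": "38.580", "shape_pt_lon": "-121.490"},
]


def stop_features(rows):
    """One GeoJSON Point per stop; GeoJSON order is [longitude, latitude]."""
    return [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [float(r["stop_lon"]), float(r["stop_lat"])],
            },
            "properties": {"stop_id": r["stop_id"],
                           "stop_name": r["stop_name"]},
        }
        for r in rows
    ]


def shape_features(rows):
    """One GeoJSON LineString per shape_id, ordered by shape_pt_sequence."""
    points = defaultdict(list)
    for r in rows:
        points[r["shape_id"]].append(
            (int(r["shape_pt_sequence"]),
             float(r["shape_pt_lon"]), float(r["shape_pt_lat"]))
        )
    features = []
    for shape_id, pts in points.items():
        coords = [[lon, lat] for _, lon, lat in sorted(pts)]
        features.append({
            "type": "Feature",
            "geometry": {"type": "LineString", "coordinates": coords},
            "properties": {"shape_id": shape_id},
        })
    return features


collection = {
    "type": "FeatureCollection",
    "features": stop_features(stops) + shape_features(shapes),
}
print(json.dumps(collection, indent=2)[:200])
```

Assuming folium is installed, a dictionary like `collection` can be passed to `folium.GeoJson` and added to a `folium.Map`, and `folium.LayerControl` can supply the checkbox list of layers.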